Unknown Malcode Detection Using OPCODE Representation

Authors

  • Robert Moskovitch
  • Clint Feher
  • Nir Tzachar
  • Eugene Berger
  • Marina Gitelman
  • Shlomi Dolev
  • Yuval Elovici
Abstract

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. Today’s signature-based anti-virus tools are very accurate, but cannot detect new malicious code. Recently, classification algorithms have been employed successfully for the detection of unknown malicious code. However, most of these studies represent the binary code of the executables as byte-sequence n-grams. We propose the use of OpCodes (operation codes), generated by disassembling the executables. We then use n-grams of the OpCodes as features for the classification process. We present a full methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation on a test collection of more than 30,000 files, in which we evaluated the OpCode n-gram representation and investigated the imbalance problem, referring to real-life scenarios in which malicious files are expected to make up about 10% of all files. Our results indicate that greater than 99% accuracy can be achieved through the use of a training set that has a malicious file percentage lower than 15%, which is higher than in our previous experience with byte sequence n-gram representation [1].
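As an illustration of the representation described in the abstract, the following minimal Python sketch turns a sequence of OpCodes into n-gram features. It assumes the executable has already been disassembled into a list of mnemonics. The function names, the sample sequence, and the choice of normalized term frequency as the feature value are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous run of n OpCodes, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(opcodes: Sequence[str], n: int) -> Dict[NGram, float]:
    """Map each distinct n-gram to its share of all n-grams in the file.

    Assumption: a plain normalized term frequency, as in basic text
    categorization; the paper may weight features differently.
    """
    counts = Counter(opcode_ngrams(opcodes, n))
    total = sum(counts.values())
    if total == 0:
        return {}  # file is shorter than n OpCodes
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical disassembly output for one file (mnemonics only).
sample = ["push", "mov", "push", "mov", "call"]
features = term_frequencies(sample, 2)
```

Each file then becomes a sparse vector indexed by n-gram, which a standard classifier can consume in the same way it would consume word counts from a text document.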

Similar resources

Opcode sequences as representation of executables for data-mining-based unknown malware detection

Malware can be defined as any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing faster every year and poses a serious global security threat. Consequently, malware detection has become a critical topic in computer security. Currently, signature-based detection is the most widespread method used in commercial antivirus. In spite of the ...

Study of Dataset Feature Filtering of OpCode for Malware Detection Using SVM Training Phase

Malware can be defined as any type of malicious code that has the potential to harm a computer or network. To detect unknown malware families, the frequency of the appearance of Opcode (Operation Code) sequences is used through dynamic analysis. Opcode n-gram analysis is used to extract features from the inspected files. Opcode n-grams are used as features during the classification process with t...

Opcode-Sequence-Based Semi-supervised Unknown Malware Detection

Malware is any computer software potentially harmful to both computers and networks. The amount of malware is growing every year and poses a serious global security threat. Signature-based detection is the most extended method in commercial antivirus software, however, it consistently fails to detect new malware. Supervised machine learning has been adopted to solve this issue, but the usefulne...

Using opcode sequences in single-class learning to detect unknown malware

Malware is any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing at a faster rate every year and poses a serious global security threat. Although signature-based detection is the most widespread method used in commercial antivirus programs, it consistently fails to detect new malware. Supervised machine-learning models have been used to ...

Fileprint analysis for Malware Detection

Salvatore J. Stolfo, Ke Wang, Wei-Jen Li, Columbia University. Review draft, June 19, 2005. Malcode can be easily hidden in document files and embedded in application executables. We demonstrate this opportunity of stealthy malcode insertion in several experiments using a standard COTS Anti-Virus (AV) scanner. In the case of zero-day malicious exp...

Publication date: 2008